Intelligent Power Capability
Deep Power Down Technology
Deep Power Down (DPD) technology is exclusive to mobile CPUs, and we have already briefly detailed the new C6 state. The Penryn DPD implementation consists of four major parts:
- The 8KB per-core SRAM state storage, which is independently powered and error corrected to keep the data valid. This is needed because in the C6 state both the L1 and L2 caches are flushed and powered down;
- The microcode that manages the state storage and subsequent core synchronisation;
- The state definitions, which include the architectural state but not all of the micro-architectural state, and none of the temporary registers used by the uCode ROM;
- The actual power management unit that manages the power up/down sequence and its coordination with the rest of the system. It supports the handshake with the platform and chipset and manages the sequence of events between voltage regulation, PLL and clock.
On entering C6, the core state defined above is put into storage and the voltage is dropped to a point where there is virtually no leakage from the die. The only power still drawn goes to the 16KB of SRAM (8KB per core) and to the IO ring, which needs to remain on to provide an interface to the chipset so the CPU knows when to wake up. Overall power usage is a few hundred milliwatts.
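To make the save and restore sequence concrete, here is a minimal sketch of a core going into C6 and coming back out. It is a toy model rather than anything Intel has published: the `Core` class, its field names and the register names are all hypothetical, and the 16KB total is assumed to be simply two cores at 8KB each.

```python
from dataclasses import dataclass, field

SRAM_BYTES_PER_CORE = 8 * 1024  # per-core state storage size quoted above


@dataclass
class Core:
    """Toy model of one core; every field name here is hypothetical."""
    arch_state: dict = field(default_factory=dict)  # architectural registers
    scratch: dict = field(default_factory=dict)     # microcode temporaries, never saved
    caches_valid: bool = True
    sram: dict = field(default_factory=dict)        # independently powered save area
    powered: bool = True

    def enter_c6(self) -> None:
        self.sram = dict(self.arch_state)  # save the architectural state only
        self.caches_valid = False          # caches are flushed and powered down
        self.arch_state = {}               # core voltage drops, live state is lost
        self.scratch = {}
        self.powered = False

    def exit_c6(self) -> None:
        self.powered = True
        self.arch_state = dict(self.sram)  # restore from the save area


def total_sram_bytes(cores: int) -> int:
    """Assumes the total is just the per-core storage times the core count."""
    return cores * SRAM_BYTES_PER_CORE
```

The point of the model is that only the architectural state survives the round trip; the scratch values and cache contents have to be rebuilt after wake-up.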
C6 is entered on a per-die basis only, not on a per-core basis, meaning a mobile dual-core Penryn processor will only go into C6 if the system is left completely idle and doesn't need to use any CPU clock cycles. The real advantage comes when you move to a quad-core Penryn-based mobile chip, which pairs two dual-core dies. This could happily exist with one die churning away at pretty intensive tasks that don't require more than two threads to run at once.
Simple web browsing, music listening or general productivity work falls into this category, with the second pair of cores sitting in a Deep Power Down C6 state using next-to-no power and prolonging battery life as if you were using a dual-core. However, should you need to do something particularly intensive, the extra cores would kick in and you'd suddenly have "double" the computing power at your disposal.
It is, however, very, very much dependent on the BIOS and OS knowing where to direct threads. If you've got a system that is directing a couple of jobs to cores one and three, that's both dies being used for nominal tasks all of the time. However, if the BIOS and OS realise that there are two separate dies and that one can be completely shut down (by giving cores one and two priority, for example), then it could work effectively.
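The difference between the two placements can be sketched in a few lines. This assumes a quad-core part built from two dual-core dies with cores numbered die by die, and it counts cores from zero, so the article's "cores one and three" become cores 0 and 2. The function names are hypothetical, not any real scheduler's API.

```python
CORES_PER_DIE = 2  # assumption: each die holds two cores, numbered consecutively


def die_of(core: int) -> int:
    """Map a zero-based core index to the die it sits on."""
    return core // CORES_PER_DIE


def dies_awake(busy_cores) -> set:
    """Dies that must stay powered because at least one of their cores is busy."""
    return {die_of(core) for core in busy_cores}


def pack_threads(n_threads: int, n_cores: int = 4) -> list:
    """Fill cores in index order so one die fills up before the next is touched."""
    if not 0 <= n_threads <= n_cores:
        raise ValueError("thread count must fit on the available cores")
    return list(range(n_threads))


spread = dies_awake([0, 2])            # jobs on cores one and three
packed = dies_awake(pack_threads(2))   # jobs on cores one and two
```

With the spread placement both dies stay awake, while the packed placement leaves the second die free to drop into C6.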
The silicon is essentially the same on mobile parts as it is on desktop, though; it's just that a mobile chipset and the BIOS ACPI modes need to be configured to make use of it.
Santa Rosa is designed to make use of it, and there's already a handshake between current chipsets and these processors, but if you're upgrading a notebook make sure it has a BIOS update to properly recognise the new CPUs. It's not just about dropping in a new chip and making the system recognise the new CPUID: the chipset needs to have its ACPI modes reassigned and its C-state timings adjusted, otherwise it won't use the new power down technology efficiently.
Enhanced Dynamic Acceleration Technology
The concept of Enhanced Dynamic Acceleration Technology (EDAT), or "Turbo mode" as bit-tech was told it is known internally at Intel, is to use the available power headroom of an idle core to boost the performance of another core. When one of the two cores enters a C3 or deeper idle power state and the other core is continually executing a single thread, the core in use gets a voltage and frequency boost. The CPU still remains within its rated TDP, so it requires no extra cooling.
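The boost condition boils down to a simple check, sketched below. It assumes that "C3 or lower" means C3 or any deeper sleep state (a higher C-state number), and the function names are hypothetical; the real decision is made in hardware and firmware, not in software like this.

```python
BOOST_THRESHOLD = 3  # other core must be in C3 or a deeper state


def cstate_depth(name: str) -> int:
    """Turn a label such as 'C3' into its numeric depth (3)."""
    label = name.strip().upper()
    if not label.startswith("C") or not label[1:].isdigit():
        raise ValueError(f"not a C-state label: {name!r}")
    return int(label[1:])


def edat_boost(active_core_busy: bool, other_core_cstate: str) -> bool:
    """True when the busy core may take the idle core's power headroom."""
    return active_core_busy and cstate_depth(other_core_cstate) >= BOOST_THRESHOLD
```

So a busy core paired with a sibling in C3 or C6 qualifies, while a sibling sitting in C1 does not.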
So why not in desktop or server? Well, those systems aren't constrained by such extreme thermal parameters. They are constrained more by core voltage (which is typically higher for these parts than for mobile anyway), which affects CPU lifetime and wear rate - something overclockers are particularly aware of. EDAT is designed so that it doesn't matter if the other core comes out of a DPD C-state momentarily to do something menial: the extra heat output is negligible as long as it drops back down again to C3 or deeper. This means the intense single-threaded application the other core is churning through will continue to run at the higher frequency unaffected.
Extended Power Down States for Harpertown Xeons
Server power consumption is a hot topic these days, with any power saving offering a considerable reduction in electricity and cooling bills, depending on the size of the farm. Since servicing snoop cycles to retain cache coherency can use up to "30 percent" extra power on otherwise idle "CC1" cores, the new "CC3" state for the Harpertown server platform can reduce both traffic and power usage.
Because the L2 cache is shared between the cores, the L1 cache is copied (or "flushed") to L2, where the chipset now just snoops to see what's there with respect to main memory and other CPUs. This means the cores can remain idle, as the L2 cache has to be kept constantly powered anyway. The actual performance loss from repopulating the L1 cache is negligible.
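As a rough illustration of why the flush helps, here is a heavily simplified sketch in which each cache is just a dictionary of address to data. Real coherency protocols are far more involved; the function names and the "core disturbed" flag are assumptions made purely for this example.

```python
def flush_l1_to_l2(l1: dict, l2: dict) -> None:
    """Copy every L1 line into the shared L2, then empty the L1."""
    l2.update(l1)
    l1.clear()


def snoop(address: int, l1: dict, l2: dict) -> tuple:
    """Return (data, core_disturbed) for a coherency lookup.

    If the line still lives in L1, the core's cache has to answer,
    so the core is disturbed; otherwise the shared L2 answers alone.
    """
    if address in l1:
        return l1[address], True
    return l2.get(address), False


l1 = {0x40: "new"}
l2 = {0x40: "old", 0x80: "other"}
before = snoop(0x40, l1, l2)  # L1 holds the latest copy, core is disturbed
flush_l1_to_l2(l1, l2)
after = snoop(0x40, l1, l2)   # L2 now answers, core stays idle
```

After the flush the same lookup returns the same data, but without touching the idle core.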